Abstract. A recurrent neural network is considered that can retrieve a collection of patterns,
as well as slightly perturbed versions of this 'pure' set of patterns, via fixed points of its dynamics.
By replacing the set of dynamical constraints, i.e., the fixed point equations, by an extended
collection of fixed-point-like equations, analytical expressions are found for the weights $w_{ij}(b)$
of the net, which depend on a certain parameter $b$. This so-called basin parameter $b$ is such
that for $b = 0$ there are, a priori, no perturbed patterns to be recognized by the net. It is shown
by a numerical study, via probing sets, that a net constructed to recognize perturbed patterns,
i.e., with values of the connections $w_{ij}(b)$ with $b \neq 0$, possesses larger basins of attraction than
a net made with the help of a pure set of patterns, i.e., with connections $w_{ij}(b = 0)$. The
mathematical results obtained can, in principle, be realized by an actual, biological neural net.

Keywords: recurrent neural network, basin of attraction, margin parameter 



PACS numbers: 84.35+i, 87.10+e 

1. Introduction 

The capacity of a neural network to recognize a pattern that is not precisely equal 
to, but resembles, a given, stored pattern, is characterized by what is called, in a 
mathematical context, the 'basin of attraction' of the stored pattern. If the basin is
small, the network is able to associate only a small set of similar patterns
with a typical pattern, whereas for a large basin the set of similar patterns that can be
recognized is large.

Once a pattern has been presented to a neural network, the neural network starts 
to evolve under the influence of its own internal dynamics. If the network, at the end 
of this process, ends in a unique state this state is called a (single) attractor of the 
network. It is also possible that the network hops between more than one final state,
in which case one speaks of a multiple attractor [1, 2, 3]. Patterns that evolve to an
attractor are said to belong to the basin of attraction of this attractor. Many ways of
characterizing basins are in vogue: basins are said to be deep or shallow and narrow
or wide [4].

A way to influence the basins of the attractors is to change the network dynamics,
switching from deterministic to stochastic dynamics [5]. Another way to change the
dynamics of the neural network is to vary the connections during the learning stage.
The latter possibility can be exploited in a model for an actual, biological system.

We are primarily interested in biological neural networks. Therefore, we are not 
aiming at mathematical problems such as (optimizing) the storage capacity in relation 
to the sizes of the basins of attraction, a subject that has got ample attention in the 
literature (1O[0[|[|0. 

Many dynamical systems are parameterized by a certain constant $\kappa$, sometimes
called the 'margin parameter'. The margin parameter $\kappa$ is claimed to be related to the
size of the basins of attraction of the fixed points of the dynamics of a neural network
[16, 17, 18]. Naively, one would expect, for reasons that are directly related
to the way this parameter $\kappa$ is introduced in the model, that the larger $\kappa$, the larger
the basins of attraction will be. However, as the 1997 study of Rodrigues Neto and
Fontanari indicates, this may not be true. Their numerical analysis for tiny networks
(up to 24 neurons) suggests that the number of attractors increases with increasing $\kappa$
and that, perhaps because of this increase, the basins of attraction are not enlarged,
as one might expect at first [20]. In section 3, we arrive at a precise
interpretation for the margin parameter $\kappa$, which, usually, is introduced as an ad hoc
quantity. In section 4, we consider the effect of the margin parameter on the basins for
a network with 256 neurons. For a basin parameter $b = 0$, we find that the larger $\kappa$,
the larger the basins, in agreement with what one would expect naively (see figure 1
for $b = 0$).

The 1992 study of Wong and Sherrington is also concerned with the sizes of the
basins of attraction. One of their findings is, roughly speaking, that the noisier the set
of learning patterns, the larger the basins of attraction [4]. Our findings support their
observations (see figure 1 for $b > 0$).

We consider a network in its final state only, i.e., after the process of learning
has stopped. This makes our study time-independent. We try to construct a
network with weights $w_{ij}$ that can not only store a certain set of $p$ prescribed patterns
$\xi^\mu = (\xi_1^\mu, \ldots, \xi_N^\mu)$, where $\mu = 1, \ldots, p$, but that can also remember a larger set of
patterns, centered around these typical patterns. These enlarged sets, called $\Omega^\mu(b)$
below, are characterized by the basin parameter $b$ mentioned above. If $b = 0$, the
set $\Omega^\mu(b)$ reduces to the sole pattern $\xi^\mu$. What we obtain, finally, are values for the
weights that depend on this basin parameter $b$:

\[ w_{ij}(t) = \begin{cases} w_{ij}(t_0) + v_{ij}(t) & (j \in V_i) \\ w_{ij}(t_0) & (j \in V_i^c) \end{cases} \qquad (1) \]

with

\[ v_{ij}(t) = N^{-1} \sum_{\mu,\nu=1}^{p} \left[\kappa - \bar{\gamma}_i^\mu(b, w_i(t_0))\right] (2\xi_i^\mu - 1)\, (C_i^{-1}(b))^{\mu\nu} \left[(1-b)\,\xi_j^\nu + b\,(1 - \xi_j^\nu)\right] \qquad (2) \]

[see section 5, eqs. (32), (34) and (37)]. Here, the $w_{ij}(t_0)$ are arbitrary numbers, which
can be interpreted, in a different context, as initial values for the weights, at an initial
time $t_0$, as is suggested by the notation. Furthermore, $N$ is the number of neurons
of the network. We abbreviated $w_i(t_0) := (w_{i1}(t_0), \ldots, w_{iN}(t_0))$. The quantities $\bar{\gamma}_i^\mu$
are defined in (15) for arbitrary $w_i$, and the matrices $C_i$ are defined in (24). The
$\bar{\gamma}_i^\mu$ depend on threshold potentials $\theta_i$, the basin parameter $b$ and the input patterns
$\xi^\mu$. $V_i$ and $V_i^c$ are index collections defined in such a way that the weights $w_{ij}$ are
adaptable if $j \in V_i$ and constant if $j \in V_i^c$ (for all $i = 1, \ldots, N$). If the (constant) $w_{ij}$
vanish, there is no connection between $i$ and $j$. Hence, the $w_{ij}$ ($j \in V_i^c$) determine the
network topology. The more $w_{ij}$ ($j \in V_i^c$) vanish, the more 'diluted' the network. We
have written $w_{ij}(t)$ for the weights, to facilitate comparison with earlier results (see,
e.g., [21]). In the present article they are time-independent constants, however.

The formulae (1)-(2), constituting the main result of this article, generalize well-known
results for the weights of a recurrent network. The generalizations concerned
are: the network may not be fully connected, and the weights may depend upon the
prescribed sizes of the basins, characterized by the basin parameter $b$. For $b = 0$ we
recover our earlier result for a diluted network [21].

It turns out that in some cases the basins of attraction are larger for values of
$b$ unequal to zero. In other words, a network which has learned not only a set of
patterns $\xi^\mu$ but a collection of perturbed patterns $\Omega^\mu(b)$, will possess larger basins
of attraction. Hence, a network can optimally recognize perturbed patterns if it has
been constructed with perturbed patterns. This is what Wong and Sherrington [4], in
a related study, but for a network with connections that are changed during a learning
process, call the 'principle of adaptation': a neural network is found to perform best
in an operating environment identical to the training environment. Our analysis of
the system after the process of learning is completed confirms this observation, albeit
that the word 'identical' is not to be taken literally. This concludes our general introduction
to the problem. We now give a short overview of our article.

In section 2 we start by defining mathematically the problem of finding suitable
synaptic weights by formulating the equations to be obeyed by the weights $w_{ij}$ of
the connections. In section 3.1 and the appendix we indicate how we could obtain,
in principle, a series expansion in the parameter $b$ for the solution of the equations.
To actually calculate the first terms of the expansion would be very time consuming.
We therefore proceed differently. In section 3.2 we rewrite the implicit expression
found in section 3.1 in such a way that we can easily find an approximation [see
eq. (31)]. What we essentially do is to replace, in the alternative implicit expression
found in section 3.2, a certain average $\bar{\gamma}_i^\mu$ related to the $i$-th neuron potential $h_i$,
threshold $\theta_i$ and the activity $x_j$, given explicitly by eq. (15) below, by one and the
same constant $\kappa$. We thus find, by identification of the $\kappa$ in an old result and the
$\kappa$ introduced here, an interpretation for the margin parameter $\kappa$. Whether this
replacement of the functions $\bar{\gamma}_i^\mu$ by one and the same constant makes sense is studied
in the next section. In section 4 we introduce a probing set, characterized by a probing
parameter $\tilde{b}$. The network's performance, as a function of the basin parameter $b$, is
calculated numerically for different values of the probing parameter $\tilde{b}$. We thus test
our approximation to the exact solution, and find it to be quite satisfactory.

2. Mathematical formulation of the problem 

2.1. Equations for the enlarged sets of input patterns 

Consider a recurrent network of $N$ neurons. A neuron $i$ of this network fires ($x_i = 1$)
if its potential $h_i = \sum_{l=1}^{N} w_{il} x_l$ surpasses a certain threshold value $\theta_i$ ($i = 1, \ldots, N$).
The dynamics of the network is given by the deterministic equation

\[ x_i(t + \Delta t) = \Theta_H\Big(\sum_{l=1}^{N} w_{il}\, x_l(t) - \theta_i\Big) \qquad (i = 1, \ldots, N) \qquad (3) \]

where $\Theta_H$ is the Heaviside step function: $\Theta_H(z) = 1$ for positive $z$ and zero elsewhere.
The states of the neurons of the network will be updated simultaneously, i.e., we use
parallel dynamics.
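The parallel update rule (3) can be illustrated with a minimal Python sketch. The three-neuron network below, with its hand-picked weights and thresholds, is purely hypothetical and is not taken from the article; the function names are likewise illustrative.

```python
from typing import List


def heaviside(z: float) -> int:
    """Step function of eq. (3): 1 for positive z, 0 elsewhere."""
    return 1 if z > 0 else 0


def parallel_step(w: List[List[float]], theta: List[float], x: List[int]) -> List[int]:
    """One synchronous update: every neuron reads the same old state x."""
    n = len(x)
    return [heaviside(sum(w[i][l] * x[l] for l in range(n)) - theta[i])
            for i in range(n)]


# Hypothetical 3-neuron example (weights and thresholds chosen by hand).
w = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [-1.0, -1.0, 0.0]]
theta = [0.5, 0.5, 0.5]

fixed_point = parallel_step(w, theta, [1, 1, 0])   # state maps onto itself
hop_a = parallel_step(w, theta, [1, 0, 0])         # first state of a two-cycle
hop_b = parallel_step(w, theta, hop_a)             # returns to [1, 0, 0]
```

In this toy network the state [1, 1, 0] is a fixed point, whereas [1, 0, 0] and [0, 1, 0] map onto each other, which is an instance of the 'multiple attractor' mentioned in the introduction.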

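Let us suppose that the network is such that it can store the $p$ patterns $\xi^1, \ldots, \xi^p$,
where $\xi^\mu$ is an $N$-dimensional vector consisting of zeros and ones. For the $\mu$-th pattern
we have $h_i = \sum_{l=1}^{N} w_{il}\, \xi_l^\mu$, hence the weights $w_{il}$ of this network are constrained by
the fixed point equations, following from (3),

\[ \xi_i^\mu = \Theta_H\Big(\sum_{l=1}^{N} w_{il}\, \xi_l^\mu - \theta_i\Big) \qquad (\mu = 1, \ldots, p;\; i = 1, \ldots, N) \qquad (4) \]

(see, e.g., [21]).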

Once the weights $w_{il}$ occurring in (4) have been determined for chosen collections
of patterns $\xi^\mu$, one may ask which patterns $x$, alike but not exactly equal
to one of the $\xi^\mu$'s, evolve to the fixed point $\xi^\mu$, i.e., what are the basins of attraction
of the fixed points $\xi^\mu$. It is precisely the purpose of this article to study this question
in some detail.

The basin of attraction of an attractor of a dynamical system is defined to be
the collection of vectors that evolve, in one or many steps, to this attractor. We are
here interested in the question which vectors $x$, belonging to certain disjoint sets of
patterns $\Omega^\mu$, centered around typical patterns $\xi^\mu$ ($\mu = 1, \ldots, p$), arrive, in one step
of the dynamics only, at the fixed point $\xi^\mu$. These latter $x$'s certainly belong to the
basin of attraction as defined above, and will be referred to, for the sake of simplicity,
as the basin of attraction, although it is only a part, namely, the one-step part, of the
actual basin of attraction.

In order to take our newly defined basin of attraction into account, we shall
replace the requirement (4), an equation for the weights $w_{il}$, by

\[ \Theta_H\Big(\sum_{l=1}^{N} w_{il}\, x_l - \theta_i\Big) = \xi_i^\mu \qquad (\mu = 1, \ldots, p;\; i = 1, \ldots, N) \qquad (5) \]

where the patterns $x$ belong to certain given disjoint sets of patterns $\Omega^\mu$, still to
be specified, centered around typical patterns $\xi^\mu$ ($\mu = 1, \ldots, p$). Equation (5) is the
central equation of this article; we are no longer concerned with the equations (3)
or (4). Note that this equation is time-independent; nevertheless, we will indicate
the final solution for the weights by $w_{ij}(t)$, in order to suggest that these are the
weights after a period of learning. We shall determine by an (approximating) analytical
procedure the weights $w_{il}$ such that (5) is probably satisfied for most of the patterns
$x$, but not necessarily for all patterns. The latter will depend on the chosen collections
$\Omega^\mu$ ($\mu = 1, \ldots, p$). Having obtained the weights $w_{il}$, for such a particular choice of
$\Omega^\mu$'s, we shall check by a numerical procedure whether all $x \in \Omega^\mu$ actually satisfy
(5). This will indeed not always, but often, be the case. Thus we shall have obtained
values for the weights which could be useful for an actual network.

As stated above, we are not concerned, in this article, with the process via
which learning takes place; we are only studying the purely mathematical problem
of finding values for the weights $w_{ij}$ that guarantee storage and retrieval properties
of a neural net. This leaves us with the question whether the values given by such a
dry, mathematical requirement can actually be realized by the wet-ware constituted
by the neurons and their connections. This point will be the subject of a next article
[22], where it will turn out that a biological system can realize values for the weights
which very closely approximate the values obtained here: compare formulae (1)-(2)
with (38)-(39) of [22].

Distinguishing the cases $\xi_i^\mu = 0$ (no neuron activity) and $\xi_i^\mu = 1$ one may verify
that the equations (5) are fulfilled if and only if

\[ \gamma_i^\mu(w_i) := \Big(\sum_{l=1}^{N} w_{il}\, x_l - \theta_i\Big)(2\xi_i^\mu - 1) > 0 \qquad \forall\, x \in \Omega^\mu \qquad (6) \]

where we abbreviated $w_i := (w_{i1}, \ldots, w_{iN})$, and where $\Omega^\mu$ is a collection of patterns
which will be made explicit in section 2.2. Let $p^\mu(x)$ be the probability of occurrence
of a pattern $x$ in the set $\Omega^\mu$ of patterns centered around a typical pattern $\xi^\mu$. From
(6) it follows that the averages $\bar{\gamma}_i^\mu$ defined as

\[ \bar{\gamma}_i^\mu(w_i) := \sum_{x \in \Omega^\mu} p^\mu(x) \Big(\sum_{l=1}^{N} w_{il}\, x_l - \theta_i\Big)(2\xi_i^\mu - 1) \qquad (\mu = 1, \ldots, p;\; i = 1, \ldots, N) \qquad (7) \]

are also positive, i.e.,

\[ \bar{\gamma}_i^\mu(w_i) > 0. \qquad (8) \]

Conversely, the fact that the averages are positive, $\bar{\gamma}_i^\mu > 0$, does not necessarily imply
that $\gamma_i^\mu > 0$ ($i = 1, \ldots, N$; $\mu = 1, \ldots, p$). Throughout this article, the averages $\bar{\gamma}_i^\mu$ will
play a central role.
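As a hedged illustration of the quantities $\gamma_i^\mu$ of eq. (6), the following Python sketch computes them for a given input $x$ and target pattern, and tests whether $x$ is mapped onto the pattern in one step. The small network and the function names are hypothetical, chosen only to keep the example self-contained. Note that the strict inequality is used as the criterion, so the boundary case of a potential exactly equal to the threshold is counted as a failure.

```python
from typing import List


def margins(w: List[List[float]], theta: List[float],
            x: List[int], xi: List[int]) -> List[float]:
    """gamma_i of eq. (6) for input x and target pattern xi."""
    n = len(x)
    return [(sum(w[i][l] * x[l] for l in range(n)) - theta[i]) * (2 * xi[i] - 1)
            for i in range(n)]


def retrieved_in_one_step(w: List[List[float]], theta: List[float],
                          x: List[int], xi: List[int]) -> bool:
    """True if all gamma_i are positive, i.e. x is in the one-step basin of xi."""
    return all(g > 0 for g in margins(w, theta, x, xi))


# Hypothetical 3-neuron network (hand-picked numbers, not from the article).
w = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [-1.0, -1.0, 0.0]]
theta = [0.5, 0.5, 0.5]
xi = [1, 1, 0]

pure = margins(w, theta, xi, xi)                             # the stored pattern itself
perturbed_ok = retrieved_in_one_step(w, theta, [1, 0, 0], xi)  # one bit flipped
```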

Let $n_\Omega$ be the total number of patterns belonging to any of the collections $\Omega^\mu$
($\mu = 1, \ldots, p$). Since, in general, the number $n_\Omega$ is larger than the number $p$, the set
of equations (5) will be more restrictive than the set (4).

In the following, we shall consider biological networks, for which $w_{ii} = 0$
($i = 1, \ldots, N$). Moreover, we shall consider partially connected (or diluted) networks,
i.e., we allow for the possibility that a particular set of connections $w_{ij}$ vanish. In
general, we shall suppose that a certain subset of connections $w_{ij}$ have prescribed
values, which may or may not be zero. In order to formalize this, we introduce the
sets $V_i$ ($i = 1, \ldots, N$) and their complements $V_i^c$: the $V_i$ contain all indices $j$ for which
$w_{ij}$ is not prescribed, but to be determined via equation (5), while their complements
$V_i^c$ contain all indices $j$ for which $w_{ij}$ have certain prescribed values, which may or
may not be zero. Let the total number of indices $j$ for which $w_{ij}$ ($i = 1, \ldots, N$)
is prescribed be given by $M$. Then (6) is a set of $N n_\Omega$ inequalities to be satisfied by
the $N^2 - M$ unknown weights $w_{ij}$.

Multiplying both sides of (5) by $p^\mu(x)\, x_j$ and summing over $\mu$ and $x$ we obtain

\[ \sum_{\mu=1}^{p} \sum_{x \in \Omega^\mu} p^\mu(x)\, \Theta_H\Big(\sum_{l=1}^{N} w_{il}\, x_l - \theta_i\Big)\, x_j = \sum_{\mu=1}^{p} \sum_{x \in \Omega^\mu} p^\mu(x)\, \xi_i^\mu\, x_j \qquad (9) \]

These are $N^2 - M$ equations for the $N^2 - M$ non-prescribed weights $w_{ij}$, from
which we want to solve the $w_{ij}$, once the $\Omega^\mu$, or, equivalently, the $p^\mu(x)$ are
specified. Notwithstanding the fact that the number of equations equals the number
of unknowns, the solution of (9) for the weights $w_{il}$ is not unique, because the step
function $\Theta_H$ only requires that $\sum_{l=1}^{N} w_{il}\, x_l - \theta_i$ be positive or negative. As a side
remark we notice that equation (4) is under-determined for $p < N$: then there are
more unknowns $w_{il}$ than equations.

2.2. The distribution of patterns in the basins

We choose the following, particular, probability distribution function

\[ p^\mu(x) = \prod_{i=1}^{N} p_i^\mu(x_i) \qquad (10) \]

where

\[ p_i^\mu(x_i) = (1 - b)\, \delta_{x_i,\, \xi_i^\mu} + b\, \delta_{x_i,\, 1 - \xi_i^\mu} \qquad (11) \]

and where $b$ is a parameter between 0 and 1, which we will refer to as the 'basin-parameter'.

The sets $\Omega^\mu$ around the patterns $\xi^\mu$ are supposed to be disjoint, and a vector
$x$ outside $\cup_{\mu=1}^{p} \Omega^\mu$ has, by definition, a vanishing probability of occurrence. The
probability distribution (10)-(11), however, yields a finite, albeit very small,
probability of occurrence for a vector $x$ outside the direct surrounding $\Omega^\mu$ of $\xi^\mu$,
since it is defined for all $2^N$ possible vectors $x$. The observation that the probability
distribution (10)-(11) for $x$'s outside $\Omega^\mu$ is very small allows us to approximate the
sum over all $x \in \cup_{\mu=1}^{p} \Omega^\mu$ by the larger sum over all $x \in \{0, 1\}^N$. This approximation
will enable us to obtain analytical results.

If $b = 0$, only the patterns $x = \xi^\mu$ have a non-zero probability of occurrence. For
values of $b$ close to zero any vector $x$ has a non-zero probability of occurrence, but
only vectors $x$ close to one of the $\xi^\mu$ have a probability of occurrence comparable to
the probability of occurrence of a typical pattern. Note that the basin-parameter is
directly related to the magnitude of $\Omega^\mu$: the larger $b$, the larger the number of patterns
in $\Omega^\mu$ that resemble the pattern $\xi^\mu$. Let us denote the average over the patterns as

\[ \bar{x}_j^\mu := \sum_{x \in \{0,1\}^N} p^\mu(x)\, x_j. \qquad (12) \]

Then, from (10) and (11) we find

\[ \sum_{x \in \{0,1\}^N} p^\mu(x) = 1 \qquad (13) \]

and

\[ \bar{x}_j^\mu = \sum_{x_j = 0,1} p_j^\mu(x_j)\, x_j \prod_{k \neq j} \sum_{x_k = 0,1} p_k^\mu(x_k) = (1 - b)\, \xi_j^\mu + b\, (1 - \xi_j^\mu). \qquad (14) \]

The first equation, eq. (13), expresses the normalization of the probability distribution
function; the second one, eq. (14), expresses the fact that the average value of the
activity of neuron $j$ is a number between 0 and 1, depending on the basin-parameter
$b$. Using (12) and (13) in (7) yields

\[ \bar{\gamma}_i^\mu(b, w_i) = \Big(\sum_{l=1}^{N} w_{il}\, \bar{x}_l^\mu - \theta_i\Big)(2\xi_i^\mu - 1) \qquad (15) \]

where $b$ is the basin-parameter and where $\bar{x}_l^\mu$ is given by (14). The $w_{il}$ occurring in
this expression still have to be found.
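The distribution (10)-(11) amounts to flipping each bit of a typical pattern independently with probability $b$. A minimal Python sketch, under the assumption of a short hypothetical pattern and an arbitrary seed, draws such perturbed patterns and compares the sampled mean activity with the closed form (14); the helper names are illustrative only.

```python
import random
from typing import List


def perturb(xi: List[int], b: float, rng: random.Random) -> List[int]:
    """Draw x from eqs. (10)-(11): each bit of xi is flipped independently with probability b."""
    return [1 - s if rng.random() < b else s for s in xi]


def mean_activity(xi: List[int], b: float) -> List[float]:
    """Eq. (14): average of x_j over the distribution centred on xi."""
    return [(1 - b) * s + b * (1 - s) for s in xi]


rng = random.Random(0)        # arbitrary seed, for reproducibility only
xi = [1, 0, 0, 1, 0]          # hypothetical typical pattern
b = 0.1                       # hypothetical basin-parameter

samples = [perturb(xi, b, rng) for _ in range(20000)]
empirical = [sum(x[j] for x in samples) / len(samples) for j in range(len(xi))]
exact = mean_activity(xi, b)
```

With this many samples the entries of `empirical` should lie close to those of `exact`, which take only the two values $1 - b$ and $b$.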

3. Solving the equations

We will now try to solve the problem of finding the weights $w_{il}$ of a recurrent neural
network, in the approximation dictated by equation (9) combined with the particular
probability distribution (10)-(11), and we hope, thereby, to have obtained a useful
solution for the problem that we actually want to solve, i.e., the equations (5) or,
equivalently, (6) for given collections $\Omega^\mu$. The question to what extent we will have
achieved this goal will be answered in section 4, where we perform a numerical analysis.

The analytical approach to the problem of solving (5), an equation for the weights of
a many-neuron recurrent network, is an adapted version of the way in which Wiegerinck
and Coolen calculated the weights for a large perceptron [23].



3.1. Implicit equations for the weights 

By substituting (10)-(11) into equation (9), we can obtain explicit expressions for
both its left and its right side, and, from these, solve for the weights $w_{il}$. Using (12),
we immediately obtain for the right-hand side of (9)

\[ \sum_{\mu=1}^{p} \sum_{x \in \{0,1\}^N} p^\mu(x)\, x_j\, \xi_i^\mu = \sum_{\mu=1}^{p} \xi_i^\mu\, \bar{x}_j^\mu \qquad (16) \]

where $\bar{x}_j^\mu$ is given by (14). We turn now to the left-hand side of equation (9), the
handling of which is more complicated and will be largely done in the appendix.

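We note that if $w_{ij} = w_{ij}(\theta_i, \xi^\mu)$ is a solution of equation (5) or (9), then also
$\hat{w}_{ij}(\hat{\theta}_i, \xi^\mu) := a_i\, w_{ij}(a_i^{-1}\hat{\theta}_i, \xi^\mu)$ is a solution of equation (5) or (9), if $\theta_i$ is replaced by
$\hat{\theta}_i = a_i \theta_i$, where $a_i$ is an arbitrary positive real constant. Using this freedom of gauge with
$a_i = (\sum_{m=1}^{N} w_{im}^2)^{-1/2}$, we can adjust the order of magnitude of the weights and the
thresholds

\[ \hat{w}_{ij} = \frac{w_{ij}}{\sqrt{\sum_{m=1}^{N} w_{im}^2}}, \qquad \hat{\theta}_i = \frac{\theta_i}{\sqrt{\sum_{m=1}^{N} w_{im}^2}} \qquad (17) \]

which has as a consequence that, if $w_{ij}$ and $\theta_i$ are of the order $N^y$ ($y$ an arbitrary real
number), the hatted quantities are small, namely of the order $1/\sqrt{N}$. Note that

\[ \sum_{m=1}^{N} \hat{w}_{im}^2 = 1. \qquad (18) \]

The equations (17) and (18) enable us to switch, at any moment, from hatted to
unhatted quantities. The hatted quantities are useful in view of the property (18), a
property that is used in the appendix. One has, trivially,

\[ \Theta_H\Big(\sum_{l=1}^{N} w_{il}\, x_l - \theta_i\Big) = \Theta_H\Big(\sum_{l=1}^{N} \hat{w}_{il}\, x_l - \hat{\theta}_i\Big). \qquad (19) \]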
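The rescaling (17) and the invariance (19) can be checked numerically. The sketch below is only an illustration: the weight row, the threshold and the test patterns are hypothetical numbers, and the function names are not part of the article.

```python
import math
from typing import List, Tuple


def to_hatted(w_row: List[float], theta_i: float) -> Tuple[List[float], float]:
    """Eq. (17): divide the weights and threshold of neuron i by the norm of its weight row."""
    norm = math.sqrt(sum(v * v for v in w_row))
    return [v / norm for v in w_row], theta_i / norm


def fires(w_row: List[float], theta_i: float, x: List[int]) -> int:
    """Heaviside step of potential minus threshold, as on both sides of eq. (19)."""
    return 1 if sum(v * s for v, s in zip(w_row, x)) - theta_i > 0 else 0


w_row, theta_i = [3.0, -4.0, 12.0], 2.0          # hypothetical numbers, norm 13
w_hat, theta_hat = to_hatted(w_row, theta_i)

patterns = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
same_output = all(fires(w_row, theta_i, x) == fires(w_hat, theta_hat, x)
                  for x in patterns)
unit_norm = sum(v * v for v in w_hat)            # should equal 1, eq. (18)
```

Because the scale factor is positive, the sign of potential minus threshold is unchanged, so the hatted and unhatted neurons give the same output for every input.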

The further evaluation of the left-hand side of (9) in terms of the $\hat{w}_{ij}$ is rather
complicated and is given in the appendix. Combining the right-hand side, eq. (16),
and the left-hand side, eq. (A.21), we find an implicit equation for the $\hat{w}_{ij}$

\[ \hat{w}_{ij} = N^{-1} \sum_{\mu=1}^{p} E_i^\mu(b)\, (2\xi_i^\mu - 1)\, \bar{x}_j^\mu \qquad (i = 1, \ldots, N;\; j \in V_i) \qquad (20) \]
where the $E_i^\mu(b)$, given by



E^{b) = N 



(^r(b,mO)^iexp(-(^r(^^0)V2a) 
E^exp(-(^r(^«'0)V2a) 



(21) 



are positive quantities. In the latter equations we abbreviated $a = b(1 - b)$ and
introduced $\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i)$, quantities like the $\bar{\gamma}_i^\mu$ of equation (15), of which the precise
definition is given in the appendix by (A.17). With (20)-(21) we have obtained an
expression for the weights $\hat{w}_{ij}$ in terms of the $\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i)$, which, in turn, is a given
function of the weights $\hat{w}_{ij}$, the thresholds $\hat{\theta}_i$ and the patterns $\xi^\mu$. In other words,
the equations (20)-(21) are implicit expressions for the weights only.

We could find explicit expressions for the weights by expanding the $\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i)$ as
a power series in the basin parameter $b$

\[ \hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i) = \hat{\bar{\gamma}}_i^{\mu,0} + \hat{\bar{\gamma}}_i^{\mu,1}\, b + \hat{\bar{\gamma}}_i^{\mu,2}\, b^2 + \ldots \qquad (22) \]

Inserting this expansion into (20)-(21), using (A.16), and equating equal powers of the
expansion variable $b$, we may obtain explicit expressions for the expansion coefficients
$\hat{\bar{\gamma}}_i^{\mu,k}$ ($\mu = 1, \ldots, p$; $i = 1, \ldots, N$; $k = 0, 1, 2, \ldots$) of the power series in $b$, in terms
of the physical quantities $\theta_i$ and $w_{ij}$, where $j$ is restricted to the set $V_i^c$. We thus
would find an analytical solution of eq. (9). This scheme has been carried out by
Wiegerinck and Coolen [23] for the perceptron. We do not pursue this path for the
recurrent neural net considered here, but we will use a pragmatic shortcut to arrive
at an approximate explicit expression instead. This will be done on the basis of an
alternative implicit expression for the weights, to be derived in the next section
[see eq. (27) below].



3.2. An alternative implicit expression for the weights 



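Rewriting (A.17), we may derive an alternative expression for $E_i^\mu(b)$. To that end we
substitute (20) into (A.17):

\[ \sum_{\nu=1}^{p} C_i^{\mu\nu}(b)\, E_i^\nu(b)\, (2\xi_i^\nu - 1) = \Gamma_i^\mu(b) \qquad (23) \]

where $C_i^{\mu\nu}$ is the symmetric $p \times p$ correlation matrix given by

\[ C_i^{\mu\nu}(b) := N^{-1} \sum_{m \in V_i} \bar{x}_m^\mu\, \bar{x}_m^\nu \qquad (24) \]

with $\mu, \nu = 1, \ldots, p$ and where

\[ \Gamma_i^\mu(b) := \Big[\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i) - \Big(\sum_{m \in V_i^c} \hat{w}_{im}\, \bar{x}_m^\mu - \hat{\theta}_i\Big)(2\xi_i^\mu - 1)\Big](2\xi_i^\mu - 1). \qquad (25) \]

From (23), written for index $\lambda$, we get, by multiplying both sides by $(2\xi_i^\mu - 1)(C_i^{-1}(b))^{\mu\lambda}$ and summing over
$\lambda = 1, \ldots, p$,

\[ E_i^\mu(b) = \sum_{\lambda=1}^{p} \Gamma_i^\lambda(b)\, (C_i^{-1}(b))^{\lambda\mu}\, (2\xi_i^\mu - 1) \qquad (26) \]

where $C_i^{-1}$ is the inverse of the matrix $C_i$. With (26) we have obtained an alternative
expression for the $E_i^\mu(b)$ [see equation (21)] in terms of the same quantities, namely
$\hat{w}_{ij}$, $\xi^\mu$, $\hat{\theta}_i$ and $b$. Substitution of this alternative expression (26) into (20) leads to an
alternative expression for the $\hat{w}_{ij}$ with $j \in V_i$:

\[ \hat{w}_{ij} = N^{-1} \sum_{\mu,\nu=1}^{p} \Big[\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i) - \Big(\sum_{m \in V_i^c} \hat{w}_{im}\, \bar{x}_m^\mu - \hat{\theta}_i\Big)(2\xi_i^\mu - 1)\Big](2\xi_i^\mu - 1)\, (C_i^{-1}(b))^{\mu\nu}\, \bar{x}_j^\nu \qquad (27) \]

In equation (A.17) we introduced the $\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i)$ as functions of the weights $\hat{w}_{ij}$. Here,
we have found, conversely, the weights in terms of the $\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i)$. By inserting $\hat{w}_{ij}$ (27)
into $\hat{\bar{\gamma}}_i^\mu(b, \hat{w}_i)$, equation (A.17), and making use of the definition (24) for $C_i^{\mu\nu}(b)$ one
arrives, indeed, at an identity. In view of (17), equation (27) also holds true with all
hats dropped.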

The 7's occurring in ( p7| ) are given by 
p 



cr{b)m - i)(2er - 1) exp (-(7r(&, «^.))V2^) 



1^=1 
(E 



^Ub,w,)j:^exp{-{^^{b,w,))y2a) 
0(2^-1) 



(28) 



as follows from (21), (23) and (25). The equations (27) with (28) are an implicit
expression for the weights. Developing the $\hat{\bar{\gamma}}$'s according to (22), we might obtain an
explicit expression for the weights, just as in section 3.1.

The weights $w_{ij}$ have been constructed as a solution of equation (9), an equation
which is strongly related to equation (5). Hence, one may expect that, on the average,
the $\gamma_i^\mu$'s are positive, i.e.,

\[ \bar{\gamma}_i^\mu(b, w_i) > 0. \qquad (29) \]

We come now to the shortcut referred to above. Instead of determining the
coefficients of the expansion (22) for the $\hat{\bar{\gamma}}$'s, we truncate this expansion after the first
term. Dropping the hats and writing

\[ \bar{\gamma}_i^{\mu,0} = \kappa \qquad (30) \]

for all constant first terms in the expansions (22), we obtain from (27)

\[ w_{ij} = \begin{cases} N^{-1} \sum_{\mu,\nu=1}^{p} \Big[\kappa - \Big(\sum_{m \in V_i^c} w_{im}\, \bar{x}_m^\mu - \theta_i\Big)(2\xi_i^\mu - 1)\Big](2\xi_i^\mu - 1)\, (C_i^{-1}(b))^{\mu\nu}\, \bar{x}_j^\nu & (j \in V_i) \\ w_{ij} \ \text{(prescribed)} & (j \in V_i^c) \end{cases} \qquad (31) \]

Note that with the choice $w_{ij}(t_0) = 0$ for $j \in V_i$ and $w_{ij}(t_0) = w_{ij}$ (prescribed) for
$j \in V_i^c$ in our main result, eqs. (1)-(2), the latter equations reduce to the equations
(31). We thus have almost found the main result. The final form (1)-(2) is derived in
section 5, after a numerical analysis of the particular case (31).

In view of (29), we will choose for $\kappa$, in eq. (31), a certain positive number.
This approach, in which we replace the constants $\bar{\gamma}_i^{\mu,0}$ by a number to be found by
(numerical) trial and error, is, a priori, rather crude. The usefulness of this way of
handling will be the subject of the next section.
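To make the structure of eq. (31) concrete, the following Python sketch evaluates it for a small diluted network and verifies that the averaged quantities (15) then all equal $\kappa$. It is a sketch under stated assumptions, not the authors' numerical code: the network size, the number of patterns, the mean activity 0.5, the seed and all function names are hypothetical, and the inverse of $C_i$ is applied by solving a linear system with a simple Gauss-Jordan routine.

```python
import random
from typing import List, Set


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a y = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [a[k][:] + [rhs[k]] for k in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def mean_pattern(xi_mu: List[int], b: float) -> List[float]:
    """Eq. (14): average activity of each neuron around pattern xi_mu."""
    return [(1 - b) * s + b * (1 - s) for s in xi_mu]


def weights_eq31(xi: List[List[int]], theta: List[float], kappa: float, b: float,
                 adaptable: List[Set[int]], w_prescribed: List[List[float]]) -> List[List[float]]:
    """Weights of eq. (31); adaptable[i] plays the role of the index set V_i."""
    p, n = len(xi), len(xi[0])
    xbar = [mean_pattern(pat, b) for pat in xi]
    w = [row[:] for row in w_prescribed]
    for i in range(n):
        v_i = sorted(adaptable[i])
        v_c = [j for j in range(n) if j not in adaptable[i]]
        # Correlation matrix C_i of eq. (24), built from adaptable inputs only.
        c = [[sum(xbar[mu][m] * xbar[nu][m] for m in v_i) / n
              for nu in range(p)] for mu in range(p)]
        r = []
        for mu in range(p):
            s = 2 * xi[mu][i] - 1
            fixed = sum(w_prescribed[i][m] * xbar[mu][m] for m in v_c) - theta[i]
            r.append((kappa - fixed * s) * s)   # bracket of eq. (31) times (2 xi - 1)
        y = solve(c, r)                          # y = C_i^{-1} r (C_i is symmetric)
        for j in v_i:
            w[i][j] = sum(y[nu] * xbar[nu][j] for nu in range(p)) / n
    return w


def averaged_margin(w_i: List[float], theta_i: float, xi_i: int, xbar_mu: List[float]) -> float:
    """Eq. (15) for one neuron and one pattern."""
    return (sum(wl * xl for wl, xl in zip(w_i, xbar_mu)) - theta_i) * (2 * xi_i - 1)


# Hypothetical small example: 40 neurons, 3 patterns, dilution 0.2, theta = kappa = 1/N.
rng = random.Random(1)
N, P, B = 40, 3, 0.05
KAPPA = THETA = 1.0 / N
xi = [[1 if rng.random() < 0.5 else 0 for _ in range(N)] for _ in range(P)]
adaptable = [{j for j in range(N) if j != i and rng.random() >= 0.2} for i in range(N)]
w = weights_eq31(xi, [THETA] * N, KAPPA, B, adaptable, [[0.0] * N for _ in range(N)])
worst = max(abs(averaged_margin(w[i], THETA, xi[mu][i], mean_pattern(xi[mu], B)) - KAPPA)
            for i in range(N) for mu in range(P))
```

The quantity `worst` measures how far the averaged margins deviate from $\kappa$; up to rounding errors it should vanish, which reflects the replacement of all $\bar{\gamma}_i^\mu$ by one and the same constant. The sketch assumes that each matrix $C_i$ is invertible, which requires the averaged patterns to be linearly independent on $V_i$.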

4. Numerical results: probing the basins

In this section we will study the question regarding the size of the basins of attraction
induced by the collection of patterns $\Omega^\mu(b)$. Stated differently, we will determine
whether the solution (31) for the weights gives suitable basins of attraction. More in
particular, we will search for the optimal values $\kappa$ and $b$ to be taken in (31). This will
be done by carrying out a numerical analysis.

Let us denote, more extensively, the $\gamma_i^\mu$ of equation (6) by $\gamma_i(x, w_i, \xi_i^\mu)$.
Equation (5), with weights $w_{ij}(b)$ given by equations (31), is satisfied if, for a certain
pattern $x$, the $\gamma_i(x, w_i(b), \xi_i^\mu)$ are positive for all $i$. Therefore, we proceed as follows.

We construct probes consisting of patterns $x$ centered around the typical patterns
$\xi^\mu$, and test whether these $x$'s are recognized by the neural net, i.e., we determine the
sign of the $\gamma_i$'s for the patterns $x$ of the probe. As a probing set we take patterns which
are distributed around the typical patterns $\xi^\mu$ in the same way as before, namely as
given by formulae (10) and (11), but now with the basin-parameter $b$ replaced by a
parameter $\tilde{b}$. The latter parameter is dubbed 'probing-parameter'. In general, the
probing-parameter $\tilde{b}$ used in the test will be unequal to the basin-parameter $b$ used to
calculate the weights $w_{ij}(b)$. If the probing parameter $\tilde{b}$ vanishes, a probing collection
$\Omega^\mu(\tilde{b} = 0)$ consists of precisely one pattern, namely $\xi^\mu$.

In our numerical study, we first picked a certain value for the probing-parameter
$\tilde{b}$, thereafter took an $x$ belonging to the probing set $\Omega^\mu(\tilde{b})$ defined by this $\tilde{b}$, and
thereupon calculated the $\gamma_i(x, w_i(b), \xi_i^\mu)$, equation (6). We repeated this procedure
(for fixed $\tilde{b}$) many times, and then calculated the fraction of $x$'s of the probing set for
which all $\gamma_i(x, w_i(b), \xi_i^\mu)$ were positive.
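The probing procedure just described can be sketched in a few lines of Python. The sketch takes the weights as given; the three-neuron network used to exercise it is a hypothetical toy example with hand-picked numbers, not one of the networks of the article, and the function name is illustrative.

```python
import random
from typing import List


def basin_fraction(w: List[List[float]], theta: List[float], xi_mu: List[int],
                   b_probe: float, trials: int, rng: random.Random) -> float:
    """Fraction of probes drawn around xi_mu (flip probability b_probe)
    for which all gamma_i of eq. (6) are positive."""
    n = len(xi_mu)
    hits = 0
    for _ in range(trials):
        # Draw a probe x from the distribution (10)-(11) with b replaced by b_probe.
        x = [1 - s if rng.random() < b_probe else s for s in xi_mu]
        ok = all((sum(w[i][l] * x[l] for l in range(n)) - theta[i]) * (2 * xi_mu[i] - 1) > 0
                 for i in range(n))
        hits += ok
    return hits / trials


# Hypothetical toy network in which [1, 1, 0] is a fixed point.
w = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [-1.0, -1.0, 0.0]]
theta = [0.5, 0.5, 0.5]
xi_mu = [1, 1, 0]
rng = random.Random(0)   # arbitrary seed

at_zero = basin_fraction(w, theta, xi_mu, 0.0, 100, rng)   # probes equal the pattern
at_one = basin_fraction(w, theta, xi_mu, 1.0, 100, rng)    # probes are fully inverted
```

For a vanishing probing-parameter every probe equals the typical pattern, so the fraction is one whenever that pattern is a fixed point; intermediate values of the probing-parameter give fractions between these two extremes.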

In figure 1, we have depicted the relative number of $x$'s belonging to the basin
(vertical axis) as a function of the basin-parameter $b$ (horizontal axis). The graphs a,
b, c and d in figure 1 correspond to four values of the margin parameter $\kappa$: $\kappa = 1$,
$\kappa = 2N^{-1}$, $\kappa = N^{-1}$ and $\kappa = \frac{1}{2}N^{-1}$. All patterns are supposed to have the property
that an arbitrarily chosen $\xi_i^\mu$ has probability $a$ to be equal to 1. This probability $a$ is
referred to as the mean activity. Note that for random patterns the mean activity is
given by $a = 0.5$. Experimentally, however, the mean activity is found to be smaller
[24]. In all graphs we have chosen vanishing prescribed weights, $w_{ij} = 0$, $j \in V_i^c$,
and $\theta_i = N^{-1}$ for all $i = 1, \ldots, N$. That is, we considered diluted networks. More
specifically, we took, randomly, 20 percent of the weights to belong to the set $V_i^c$,
which corresponds to a dilution $d = 0.2$.

Each of the broken lines in the graphs 1a-1d corresponds to a different value of
the probing parameter $\tilde{b}$. Going from top to bottom in the four graphs of figure 1, we
cross curves with a larger and larger probing parameter $\tilde{b}$. For the smallest possible
value of the probing-parameter $\tilde{b}$, namely $\tilde{b} = 0$, the probing set reduces to a typical
pattern $\xi^\mu$. It follows from figure 1 (see the upper lines, little diamonds) that the
fraction of $x$'s belonging to a basin equals 1 for a large range of the basin-parameter
$b$. As is to be expected, a typical pattern indeed is a fixed point for all values of $b$
(up to some upper limit which is larger than 0.3).

For values of the probing-parameter $\tilde{b}$ close to zero, $\tilde{b} = 0.02$ say, the fraction of
$x$'s belonging to a basin equals one for a large range of the basin parameter $b$ (see the
second curves from above, indicated by little plus signs). As long as the probing-set is
smaller than the set of patterns which belong to the basin of attraction, the fraction
remains one. In case this fraction is less than one, the probing-set is larger than the
set of patterns which form the basins of attraction. Hence, the probing-parameter $\tilde{b}$
can be viewed as a measure for the size of the basin of attraction.

Figure 1. Probing of the basins for various values of the margin parameter.

In the four graphs a, b, c and d, the fraction of $x$'s with all $\gamma_i$ positive is depicted, vertically,
for four values of the parameter $\kappa$ occurring in the final expression for the weights ($\kappa = 1$,
$\kappa = 2N^{-1}$, $\kappa = N^{-1}$ and $\kappa = \frac{1}{2}N^{-1}$), as a function of the basin parameter $b$. The six
broken lines in each of the graphs correspond to different values of the probing parameter $\tilde{b}$ that
characterize the sets $\Omega^\mu(\tilde{b})$. From top to bottom, in each graph, we have plotted the fraction
of $x$'s with all $\gamma_i$ positive for values of $\tilde{b}$ given by the six numbers 0, 0.02, 0.04, 0.06, 0.08 and
0.1, respectively. The number of neurons is $N = 256$, the number of patterns equals $p = 32$.
The mean activity is $a = 0.2$. The dilution of the network is $d = 0.2$.

In each of the four graphs, a, b, c and d, that is, for four different values of the margin parameter
$\kappa$, there is an interval of values of $b$ for which the fraction equals one, for a range of values
of the probing parameter $\tilde{b}$. Hence, for probes with $\tilde{b}$ in the latter range, the net has values for
the weights $w_{ij}(b)$ which are such that the net performs optimally.

To illustrate these latter statements we take as an example figure 1d. The lines
$\tilde{b} = 0$ and $\tilde{b} = 0.02$ coincide: they are the straight horizontal line with fraction one.
For $\tilde{b} = 0.04$, corresponding to a fraction given by the curve with little squares, the
fraction rises to one as a function of $b$. This implies that the size of the basins grows
as a function of $b$. For larger values of $\tilde{b}$, given by the curves with crosses, triangles
and asterisks, the fraction also rises as a function of $b$, up to some value of $b$, but
never equals one. So in these cases, the number of elements of the probing sets always
clearly is larger than the number of elements belonging to the basins.

Now, we come to the effect of $\kappa$ on the performance of the network. Comparing
figures 1a and 1d, and looking where the fraction equals one, we discover that for large
$\kappa$, $b$ should be small, and vice versa.

In figure 2 we study for a large value $\kappa = 1$ and a small value $\kappa = \frac{1}{2}N^{-1}$ of the
margin parameter what happens when the number of patterns varies from 16 via 32 to
64. As before we have taken vanishing prescribed weights, $w_{ij} = 0$, $j \in V_i^c$, $\theta_i = N^{-1}$
for all $i = 1, \ldots, N$, and dilution $d = 0.2$. We find for $\kappa = 1$ as well as $\kappa = \frac{1}{2}N^{-1}$ that
when the number of patterns increases, the size of the basins decreases. But, since the
curves have a hump, a value for the basin parameter $b$ unequal to zero yields a network
that recognizes a larger part of the probing sets $\Omega^\mu(\tilde{b})$.

Figure 2. Probing of the basins for various numbers of patterns.

The fraction of $x$'s with all $\gamma_i$ positive is depicted, vertically, for three different values $p$ of the
number of stored patterns, $p = 16$ (top), $p = 32$ and $p = 64$ (bottom), as a function of the basin
parameter $b$. In the left column the margin parameter is chosen large compared to the threshold,
$\kappa = 1$, whereas in the right column $\kappa$ is taken of the order of the threshold, $\kappa = \frac{1}{2}N^{-1}$. The
six broken lines in each of the graphs correspond to different values of the probing-parameter
$\tilde{b}$. From top to bottom in each graph we have plotted the fraction of $x$'s with all $\gamma_i$ positive
for values of $\tilde{b}$ given by 0, 0.02, 0.04, 0.06, 0.08 and 0.1, respectively. The number of neurons is
$N = 256$, the mean activity is $a = 0.2$. The dilution of the network is $d = 0.2$.
It is seen that for $\tilde{b} \neq 0$, the fraction rises, up to some value of $b$. Hence, for large $\kappa$ (left column)
and small $\kappa$ (right column), the net performs better for $b \neq 0$, for different values of the number
of patterns $p$.
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The final observation relating to figures |^ and ^ reads that, in general, a network 
with weights Wij {b ^ 0) possesses larger basins of attraction than a network with 
weights Wij (6 = 0). 
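The probing procedure described above can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `probing_fraction`, the random sampling of probes, and the toy weights (a unit self-coupling with threshold 0.5) are hypothetical and are not the weights $w_{ij}(b)$ of this article; the stability coefficient is taken to be $\gamma_i = (\sum_j w_{ij} x_j - \theta_i)(2\xi_i - 1)$.

```python
import numpy as np

def probing_fraction(w, theta, xi, b_probe, n_probes=200, seed=0):
    """Fraction of perturbed probes x for which all gamma_i are positive.

    w       : (N, N) weight matrix
    theta   : (N,) thresholds
    xi      : (p, N) stored patterns with entries 0 or 1
    b_probe : probability with which each bit of a pattern is flipped
    """
    rng = np.random.default_rng(seed)
    p, n = xi.shape
    hits = 0
    for mu in range(p):
        # Flip each bit of pattern mu independently with probability b_probe.
        flips = rng.random((n_probes, n)) < b_probe
        x = np.where(flips, 1 - xi[mu], xi[mu])
        # gamma_i = (sum_j w_ij x_j - theta_i) * (2 xi_i - 1), one row per probe.
        gamma = (x @ w.T - theta) * (2 * xi[mu] - 1)
        hits += np.count_nonzero(np.all(gamma > 0, axis=1))
    return hits / (p * n_probes)

# Toy network (assumption, for illustration only): each neuron copies itself.
n = 8
xi = np.random.default_rng(1).integers(0, 2, size=(3, n))
w = np.eye(n)
theta = np.full(n, 0.5)

for b_probe in (0.0, 0.1, 1.0):
    print(b_probe, probing_fraction(w, theta, xi, b_probe))
```

For the toy weights a probe is accepted only if no bit was flipped, so the fraction drops from one at probing value 0 to zero at probing value 1; for the weights of this article one would pass $w_{ij}(b)$ and the actual thresholds instead.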

5. Relation to earlier work 

The above mathematical study has been performed for adaptable weights, $w_{ij}$, $j \in V_i$, 
to be determined by the equations (^, and prescribed weights, $w_{ij}$, $j \in V_i^c$. Let us 
turn to the situation of a neural network that adapts its weights, in the course of 
time, according to some learning rule. In such a network, all weights start, at $t = t_0$ 
say, with some initial value $w_{ij}(t_0)$. The weights with $j \in V_i^c$ keep their values 
throughout the learning process, while the weights $w_{ij}$ with $j \in V_i$ change in the 
course of time. Now, we ask whether we can find $\hat{w}_{ij}$ such that 
$\hat{w}_{ij}(t)$ has the prescribed values $w_{ij}(t_0)$, for all $i$ and $j$, at $t = t_0$, whereas 
$\gamma_i^\mu(b, \hat{w}_i(t))$ has a large probability of being positive. One way to obtain these 
$\hat{w}_{ij}$ is via the $w_{ij}$'s that are given by the unhatted counterpart of equation (^). In 
fact, they are given by 

\[
\hat{w}_{ij}(t) =
\begin{cases}
w_{ij}(t_0) + v_{ij}(t) & (j \in V_i) \\
w_{ij}(t_0) & (j \in V_i^c)
\end{cases}
\tag{32}
\]

where 

\[
v_{ij}(t) = w_{ij}(t) - N^{-1} \sum_{\mu,\nu} \gamma_i^\mu(b, w_i(t_0)) (2\xi_i^\mu - 1) (C^{-1}(b))^{\mu\nu} \bar{x}_j^\nu
\tag{33}
\]

in which we have denoted the (unhatted counterparts of) $w_{ij}$ of equation ( p7| ) as 
$w_{ij}(t)$. An alternative way to write equation (33) is given by 

\[
v_{ij}(t) = N^{-1} \sum_{\mu,\nu} \left[ \gamma_i^\mu(b, w_i(t)) - \gamma_i^\mu(b, w_i(t_0)) \right] (2\xi_i^\mu - 1) (C^{-1}(b))^{\mu\nu} \bar{x}_j^\nu
\tag{34}
\]

The weights $\hat{w}_{ij}$, equation (32), have been constructed in such a way that 

\[
\gamma_i^\mu(b, \hat{w}_i(t)) = \gamma_i^\mu(b, w_i(t)).
\tag{35}
\]

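The latter equation can be verified easily. In fact, inserting (32) with (34) into ( p^ 
gives 

\[
\gamma_i^\mu(b, \hat{w}_i(t)) = \gamma_i^\mu(b, w_i(t_0)) + \sum_{\nu,\lambda} \left[ \gamma_i^\nu(b, w_i(t)) - \gamma_i^\nu(b, w_i(t_0)) \right] (2\xi_i^\nu - 1)
(2\xi_i^\mu - 1) (C^{-1}(b))^{\nu\lambda} C^{\lambda\mu}(b)
\tag{36}
\]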

where we used the definitions (15) and (^. Since $C^{\mu\nu}(b)$ is symmetric, the product of 
the matrices $C$ gives a Kronecker delta, which in turn yields (35). The property (35) 
guarantees that when the $\gamma_i^\mu(b, w_i(t))$ are positive, the $\gamma_i^\mu(b, \hat{w}_i(t))$ are also positive. 
Using the same shortcut as above, equation (30), we obtain 

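\[
v_{ij}(t) = N^{-1} \sum_{\mu,\nu} \left[ \kappa - \gamma_i^\mu(b, w_i(t_0)) \right] (2\xi_i^\mu - 1) (C^{-1}(b))^{\mu\nu} \bar{x}_j^\nu
\tag{37}
\]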

with $\bar{x}_j^\nu$ given by (|lj). The equations (32) with (37) are equivalent to the main result 
(1)-(2) mentioned in the introduction. Putting in this expression the basin parameter 
equal to zero ($b = 0$), we recover the expression obtained after a learning process 
in a preceding article ||2^. This suggests that (32) with (37) is the generalization of 
the weights in a process of learning with noisy patterns. Hence, we may state that 
a network performs optimally when trained with noise ($b \neq 0$), or, stated differently 
(and less precisely): a neural network performs best in an environment identical to the 
training environment. This is what Wong and Sherrington refer to as the 'principle 
of adaptation'. In a next article, we will come back to this question extensively, 
in a biological context. The final result will turn out to be that the expression 
(32) with (37) is, apart from a detail, indeed the generalization of learning with noisy 
patterns. 
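The Kronecker-delta argument used above can be checked numerically in a simplified setting. The sketch below is not the general construction of this article: it assumes full connectivity (every $j$ adaptable), vanishing thresholds and vanishing initial weights, mean perturbed patterns $\bar{x}_j^\mu = (1-b)\xi_j^\mu + b(1-\xi_j^\mu)$, and a correlation matrix $C^{\mu\nu}(b) = N^{-1}\sum_j \bar{x}_j^\mu \bar{x}_j^\nu$. The definition of $C$ in particular is an assumption made for this illustration. Under these assumptions the weights reduce to $N^{-1}\sum_{\mu,\nu} \kappa (2\xi_i^\mu - 1)(C^{-1}(b))^{\mu\nu}\bar{x}_j^\nu$, and every $\gamma_i^\mu$ should equal $\kappa$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 64, 5          # number of neurons and of stored patterns (toy sizes)
b, kappa = 0.05, 1.0  # basin parameter and margin parameter

xi = rng.integers(0, 2, size=(p, N)).astype(float)  # patterns xi[mu, i]
x_bar = (1 - b) * xi + b * (1 - xi)                 # mean perturbed patterns
s = 2 * xi - 1                                      # signs (2 xi - 1)

# Assumed correlation matrix C[mu, nu] = N^-1 sum_j x_bar[mu, j] x_bar[nu, j].
C = x_bar @ x_bar.T / N
C_inv = np.linalg.inv(C)

# w[i, j] = N^-1 sum_{mu, nu} kappa * s[mu, i] * C_inv[mu, nu] * x_bar[nu, j]
w = kappa * (s.T @ C_inv @ x_bar) / N

# gamma[mu, i] = (sum_j w[i, j] x_bar[mu, j]) * s[mu, i], thresholds set to zero.
gamma = (x_bar @ w.T) * s

print(np.allclose(gamma, kappa))
```

The product of $C^{-1}$ and $C$ collapses the double sum, which is the same mechanism that turns (36) into (35).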



6. Conclusion 



Although we studied a neural network, we did not consider learning and learning rules. 
We simply asked the question: what values must one take for the weights of a neural 
network in order that it performs optimally, i.e., that it can retrieve the largest sets 
of perturbed patterns? We were able to reformulate this problem in a mathematically 
exact way, and to obtain a solution that, by its construction, had a certain plausibility 
of being a suitable one. Finally, we performed a numerical test, which confirmed the 
usefulness of our approach. The weights $w_{ij}(b)$ obtained in this article on the basis 
of perturbed data ($b \neq 0$) yield a network with larger basins than would have been 
obtained in the case of non-perturbed data ($b = 0$). In a subsequent article we will propose 
a biological learning rule, which is such that, apart from a minor detail, the synapses 
strive towards the values given by the main result of this article, equations (1)-(2). In other 
words, nature might realize almost totally what mathematics suggests. 



Appendix A. Derivation of implicit equations for the weights 



In this appendix we will evaluate the left-hand side of equation (^. Then combining 
this with the result of section 3.1 for the right-hand side will lead to implicit equations 
for Wij . 

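Inserting (|l^) into the left-hand side of (H), multiplying by a delta function 
containing a variable $z$ and integrating over $z$, we get the equivalent expression 

\[
\sum_\mu \sum_{x_1=0,1} \cdots \sum_{x_N=0,1} p^\mu(x) \int dz\, x_j\, \Theta_H(w_{ij} x_j - \theta_i + z)\, \delta\Big[z - \sum_{l \neq j} w_{il} x_l\Big]
= \sum_\mu \int dz \sum_{x_j} p_j^\mu(x_j)\, x_j\, \Theta_H(w_{ij} x_j - \theta_i + z)\, P_{ij}^\mu(z)
\tag{A.1}
\]

where we used ( p^ ) and where we abbreviated 

\[
P_{ij}^\mu(z) = \sum_{x_1} \cdots \sum_{x_{j-1}} \sum_{x_{j+1}} \cdots \sum_{x_N} \prod_{m \neq j} p_m^\mu(x_m)\, \delta\Big[z - \sum_{l \neq j} w_{il} x_l\Big]
\qquad (j \in V_i).
\tag{A.2}
\]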



The summation over $x_j$ in (A.1) yields 

\[
\sum_{x_j} p_j^\mu(x_j)\, x_j\, \Theta_H(w_{ij} x_j - \theta_i + z) = \bar{x}_j^\mu\, \Theta_H(w_{ij} - \theta_i + z)
\tag{A.3}
\]



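as follows by inserting (pT|). The factor $P_{ij}^\mu(z)$ can be rewritten in the following way. 
Using a well-known representation of the delta function we first obtain 

\[
P_{ij}^\mu(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, e^{ikz} \prod_{m \neq j} \sum_{x_m} p_m^\mu(x_m)\, e^{-ik w_{im} x_m}.
\tag{A.4}
\]

One has 

\[
\sum_{x_m} p_m^\mu(x_m)\, e^{-ik w_{im} x_m} = (1 - b)\, e^{-ik w_{im} \xi_m^\mu} + b\, e^{-ik w_{im} (1 - \xi_m^\mu)}
\tag{A.5}
\]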



where we used (11). Inserting (A.5) into (A.4) we may write 

\[
P_{ij}^\mu(z) = \frac{1}{2\pi} \int dk\, \exp\Big\{ ikz + \sum_{m \neq j} \ln\big[ (1 - b)\, e^{-ik w_{im} \xi_m^\mu} + b\, e^{-ik w_{im} (1 - \xi_m^\mu)} \big] \Big\}
\tag{A.6}
\]



where we used $y = \exp\{\ln y\}$. We can now expand the two exponentials occurring in 
the argument of the logarithm. This leads to a term of the form $\ln(1 + y)$. Thereupon, 
we can expand this term as $y - \frac{1}{2} y^2 + \ldots$, since $y$ is of the order of $w_{ij}$, and $w_{ij}$ is of 
the order $N^{-1/2}$, as noted above [see eqs. ( p7| ) and following text]. Thus we obtain 

\[
\ln\big[ (1 - b)\, e^{-ik w_{im} \xi_m^\mu} + b\, e^{-ik w_{im} (1 - \xi_m^\mu)} \big] = -ik w_{im} \bar{x}_m^\mu - \tfrac{1}{2} b(1 - b) k^2 w_{im}^2 + \ldots
\tag{A.7}
\]
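The second-order expansion (A.7) is easy to test numerically for a single term of the sum. The sketch below assumes that the mean perturbed pattern is $\bar{x} = (1-b)\xi + b(1-\xi)$ and reads (A.7) as $-ikw\bar{x} - \frac{1}{2}b(1-b)k^2w^2$; the function names are hypothetical. For a weight of order $N^{-1/2}$ the neglected terms should be of third order in $w$.

```python
import cmath

def exact_log(k, w, xi, b):
    """Logarithm of the one-site average over x = xi (prob. 1 - b) or 1 - xi (prob. b)."""
    return cmath.log((1 - b) * cmath.exp(-1j * k * w * xi)
                     + b * cmath.exp(-1j * k * w * (1 - xi)))

def second_order(k, w, xi, b):
    """Expansion up to second order in w: mean term plus variance term."""
    x_bar = (1 - b) * xi + b * (1 - xi)
    return -1j * k * w * x_bar - 0.5 * b * (1 - b) * k**2 * w**2

k, b = 1.0, 0.1
for xi in (0, 1):
    for w in (0.1, 0.01):
        err = abs(exact_log(k, w, xi, b) - second_order(k, w, xi, b))
        print(xi, w, err)  # the error shrinks roughly like w**3
```

The linear term is the mean of $w x$ and the quadratic term its variance $b(1-b)w^2$, which is why the result is Gaussian in $z$ after the $k$ integration.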



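Inserting (A.7) into (A.6) we may write 

\[
P_{ij}^\mu(z) = \frac{1}{2\pi} \exp\{-(z - z_0)^2 / 2\sigma\} \int dk\, \exp\Big\{ -\frac{\sigma}{2} \big( k - i(z - z_0)/\sigma \big)^2 \Big\} + \ldots
\tag{A.8}
\]

where we abbreviated 

\[
z_0 = \sum_{m \neq j} w_{im} \bar{x}_m^\mu, \qquad \sigma = b(1 - b) \sum_{m \neq j} w_{im}^2.
\tag{A.9}
\]

Using the fact that $w_{ij}$ is of the order $1/\sqrt{N}$ we may write 

\[
\sigma = b(1 - b) \sum_{m=1}^{N} w_{im}^2
\tag{A.10}
\]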



a relation we will use later. After evaluating the integral in (A.8), we obtain 

\[
P_{ij}^\mu(z) = (2\pi\sigma)^{-1/2} \exp\{-(z - z_0)^2 / 2\sigma\} + \ldots
\qquad (i = 1, \ldots, N;\ j \in V_i)
\tag{A.11}
\]



with $\mu = 1, \ldots, p$. Substituting (A.3) and (A.11) into the right-hand side of (A.1) we 
obtain for the left-hand side of (E 

\[
\sum_\mu (2\pi\sigma)^{-1/2} \int dz\, \bar{x}_j^\mu\, \Theta_H(w_{ij} - \theta_i + z) \exp\{-(z - z_0)^2 / 2\sigma\}.
\tag{A.12}
\]

The integral occurring in (A.12), 

\[
I_{ij}^\mu := (2\pi\sigma)^{-1/2} \int dz\, \Theta_H(w_{ij} - \theta_i + z) \exp\{-(z - z_0)^2 / 2\sigma\},
\tag{A.13}
\]

can be rewritten. Changing the integration variable $z$ according to $y = (z - z_0)/\sqrt{2\sigma}$, we find 



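\[
I_{ij}^\mu = \pi^{-1/2} \int_{-\infty}^{\infty} dy\, \Theta_H(w_{ij} - \theta_i + z_0 + \sqrt{2\sigma}\, y)\, e^{-y^2}
= \pi^{-1/2} \int_0^\infty dy\, e^{-y^2} + (4\pi)^{-1/2} \int_0^\infty dy\, \big[ \mathrm{sgn}(w_{ij} - \theta_i + z_0 - \sqrt{2\sigma}\, y)
+ \mathrm{sgn}(w_{ij} - \theta_i + z_0 + \sqrt{2\sigma}\, y) \big] e^{-y^2}.
\tag{A.14}
\]

The integral over the first term is a Gaussian integral; the second term can be expressed 
in terms of an error function. We obtain 

\[
I_{ij}^\mu = \frac{1}{2} + \frac{1}{2}\, \mathrm{erf}\big( [\gamma_i^\mu(b, w_i)(2\xi_i^\mu - 1) + \varepsilon_{ij}^\mu] / \sqrt{2\sigma} \big)
\qquad (i = 1, \ldots, N;\ j \in V_i)
\tag{A.15}
\]

where $\mu = 1, \ldots, p$ and where the error function is defined according to 

\[
\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dy\, e^{-y^2}.
\tag{A.16}
\]

In analogy to ( p^ ) we defined 

\[
\gamma_i^\mu(b, w_i) = \Big( \sum_{l=1}^{N} w_{il} \bar{x}_l^\mu - \theta_i \Big) (2\xi_i^\mu - 1).
\tag{A.17}
\]

Furthermore, we abbreviated 

\[
\varepsilon_{ij}^\mu = -w_{ij} \bar{x}_j^\mu + w_{ij}.
\tag{A.18}
\]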
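The step from the Gaussian integral (A.13) to the error function in (A.15) can be verified numerically. In the sketch below the symbol `c` stands for $w_{ij} - \theta_i$, and the closed form $\frac{1}{2} + \frac{1}{2}\mathrm{erf}((c + z_0)/\sqrt{2\sigma})$ is our reading of (A.14) and (A.15) before $c + z_0$ is split into $\gamma$ and $\varepsilon$; the function names, the grid size and the test values are arbitrary choices for illustration.

```python
import math
import numpy as np

def gaussian_step_integral(c, z0, sigma, n=400001, width=12.0):
    """Riemann sum for (2 pi sigma)^(-1/2) * int dz Theta_H(c + z) exp(-(z - z0)^2 / (2 sigma))."""
    half = width * math.sqrt(sigma)
    z = np.linspace(z0 - half, z0 + half, n)
    dz = z[1] - z[0]
    # The Heaviside function is represented by the boolean mask (c + z > 0).
    integrand = (c + z > 0) * np.exp(-(z - z0) ** 2 / (2 * sigma))
    return float(integrand.sum() * dz / math.sqrt(2 * math.pi * sigma))

def closed_form(c, z0, sigma):
    """Closed form 1/2 + 1/2 erf((c + z0) / sqrt(2 sigma))."""
    return 0.5 + 0.5 * math.erf((c + z0) / math.sqrt(2 * sigma))

for c, z0, sigma in [(0.3, -0.1, 0.5), (-0.4, 0.1, 0.2)]:
    print(gaussian_step_integral(c, z0, sigma), closed_form(c, z0, sigma))
```

Both columns should agree up to the discretization error of the sum, for positive as well as negative $c + z_0$.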



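Note that, apart from a $\xi_j^\mu$ dependent factor, the quantity $\varepsilon_{ij}^\mu$ equals the weight $w_{ij}$. 

In view of (17), $\varepsilon_{ij}^\mu / \sqrt{2\sigma}$ is small. The error function in (A.15) can be split into two 
contributions. For small $\varepsilon$ we have 

\[
\int_0^{\gamma + \varepsilon} dy\, e^{-y^2} = \int_0^{\gamma} dy\, e^{-y^2} + \varepsilon\, e^{-\gamma^2} + \ldots
\tag{A.19}
\]

which allows us to write for (A.15) 

\[
I_{ij}^\mu = \frac{1}{2} + \frac{1}{2}\, \mathrm{erf}\big( \gamma_i^\mu(b, w_i)(2\xi_i^\mu - 1) / \sqrt{2\sigma} \big)
+ \frac{\varepsilon_{ij}^\mu}{\sqrt{2\pi\sigma}} \exp\big( -(\gamma_i^\mu(b, w_i))^2 / 2\sigma \big) + \ldots
\tag{A.20}
\]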



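Using (A.12) and (A.20) with (A.17), the final expression for the left-hand side 
can be obtained: 

\[
\frac{1}{2} \sum_\mu \bar{x}_j^\mu \Big[ 1 + \mathrm{erf}\big( \gamma_i^\mu(b, w_i)(2\xi_i^\mu - 1) / \sqrt{2\sigma} \big) \Big]
+ \frac{b(1 - b)\, w_{ij}}{\sqrt{2\pi\sigma}} \sum_\mu \exp\big( -(\gamma_i^\mu(b, w_i))^2 / 2\sigma \big).
\tag{A.21}
\]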
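The factor $b(1-b)\,w_{ij}$ in the last term of (A.21) can be checked in one line. The step below assumes that the mean perturbed pattern is $\bar{x}_j^\mu = (1-b)\xi_j^\mu + b(1-\xi_j^\mu)$ and that (A.18) reads $\varepsilon_{ij}^\mu = w_{ij}(1 - \bar{x}_j^\mu)$; it is meant as a consistency check, not as part of the original derivation.

```latex
\begin{align*}
\bar{x}_j^\mu\, \varepsilon_{ij}^\mu &= w_{ij}\, \bar{x}_j^\mu \,(1 - \bar{x}_j^\mu), \\
\xi_j^\mu = 1:\quad \bar{x}_j^\mu &= 1 - b, \qquad 1 - \bar{x}_j^\mu = b, \\
\xi_j^\mu = 0:\quad \bar{x}_j^\mu &= b, \qquad 1 - \bar{x}_j^\mu = 1 - b, \\
\text{hence}\quad \bar{x}_j^\mu\, \varepsilon_{ij}^\mu &= b(1 - b)\, w_{ij} \quad \text{for every } \mu .
\end{align*}
```

Since the product no longer depends on $\xi_j^\mu$, it can be taken out of the sum over $\mu$, leaving only the sum over the Gaussian factors.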



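Combining the right- and left-hand sides of equation (^), as given by (|l|) and (A.21), 
respectively, we get an equation from which the weights $w_{ij}$ follow immediately 

\[
w_{ij} = \frac{\sqrt{2\pi\sigma}\, \sum_\mu \bar{x}_j^\mu \big[ (2\xi_i^\mu - 1) - \mathrm{erf}\big( \gamma_i^\mu(b, w_i)(2\xi_i^\mu - 1) / \sqrt{2\sigma} \big) \big]}
{2b(1 - b) \sum_\mu \exp\big( -(\gamma_i^\mu(b, w_i))^2 / 2\sigma \big)}.
\tag{A.22}
\]

With the properties 

\[
\mathrm{erf}\big( \gamma_i^\mu(b, w_i)(2\xi_i^\mu - 1) / \sqrt{2\sigma} \big) = (2\xi_i^\mu - 1)\, \mathrm{erf}\big( \gamma_i^\mu(b, w_i) / \sqrt{2\sigma} \big)
\tag{A.23}
\]

and 

\[
\mathrm{erf}(y) = 1 - [\ldots]
\tag{A.24}
\]

we can rewrite (A.22), 

\[
w_{ij} = \frac{\sqrt{2\pi\sigma}}{2b(1 - b)}\, [\ldots] \times \exp\big( -(\gamma_i^\mu(b, w_i))^2 / 2\sigma \big) \Big/ \sum_\mu \exp\big( -(\gamma_i^\mu(b, w_i))^2 / 2\sigma \big)
\tag{A.25}
\]

or, equivalently, ( pO| ) with (pT[), the final results to be obtained in this appendix. 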